Multiword Expression Filtering For Building Knowledge

نویسندگان

  • Shailaja Venkatsubramanyan
  • Jose Perez-Carballo
چکیده

This paper describes an algorithm that can be used to improve the quality of multiword expressions extracted from documents. We measure multiword expression quality by the “usefulness” of a multiword expression in helping ontologists build knowledge maps that allow users to search a large document corpus. Our stopword based algorithm takes ngrams extracted from documents, and cleans them up to make them more suitable for building knowledge maps. Running our algorithm on large corpora of documents has shown that it helps to increase the percentage of useful terms from 40% to 70% – with an eight-fold improvement observed in some cases.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multiword Expression Filtering for Building Knowledge Maps

This paper describes an algorithm that can be used to improve the quality of multiword expressions extracted from documents. We measure multiword expression quality by the “usefulness” of a multiword expression in helping ontologists build knowledge maps that allow users to search a large document corpus. Our stopword based algorithm takes n-grams extracted from documents, and cleans them up to...

متن کامل

Acquiring Translation Equivalences of Multiword Expressions by Normalized Correlation Frequencies

In this paper, we present an algorithm for extracting translations of any given multiword expression from parallel corpora. Given a multiword expression to be translated, the method involves extracting a short list of target candidate words from parallel corpora based on scores of normalized frequency, generating possible translations and filtering out common subsequences, and selecting the top...

متن کامل

Multiword Sequences as Building Blocks for Language: Insights into First and Second Language Learning

Many grammatical frameworks view words and rules as the basic building blocks of language, with multiword sequences being treated as peripheral exceptions in the form of idioms, etc. (e.g., Pinker, 1999). The new millennium, however, has seen a shift toward construing multiword sequences not as linguistic rarities but as important building blocks for language acquisition and processing. Based o...

متن کامل

MULTILINGUAL MULTIWORD EXPRESSIONS Literature Survey

Multiword Expressions are idiosyncratic word usages of a language which often have noncompositional meaning. The knowledge of multiword expressions is necessary for many NLP tasks like, machine translation, natural language generation, named entity recognition, sentiment analysis etc. In order for other NLP applications to benefit from the knowledge of multiword expressions, they need to be ide...

متن کامل

Building an Arabic Multiword Expressions RepositoryBuilding an Arabic Multiword Expressions RepositoryBuilding an Arabic Multiword Expressions RepositoryBuilding an Arabic Multiword Expressions RepositoryBulding an Arabic Multiword Expressions Repository

We introduce a list of Arabic multiword expressions (MWE) collected from various dictionaries. The MWEs are grouped based on their syntactic type. Every constituent word in the expressions is manually annotated with its full context-sensitive morphological analysis. Some of the expressions contain semantic variables as place holders for words that play the same semantic role. In addition, we ha...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004